@koxudaxi
Owner

@koxudaxi koxudaxi commented Jan 3, 2026

Fixes: #1955

Summary by CodeRabbit

  • New Features
    • Added support for scientific notation in YAML files without decimal points (e.g., 1e-5, 1E+10)
    • YAML parser now correctly interprets scientific notation as float values across positive and negative exponents


@coderabbitai

coderabbitai bot commented Jan 3, 2026

📝 Walkthrough

Introduces YAML scientific notation pattern recognition and extends the CustomSafeLoader to properly resolve scientific notation values (e.g., 1e-5, 1E+10) as floats rather than strings during YAML parsing, fixing a bug where such numeric defaults were being cast to string types.

Changes

| Cohort / File(s) | Summary |
|---|---|
| YAML Parser Enhancement: src/datamodel_code_generator/util.py | Added _YAML_SCIENTIFIC_NOTATION_PATTERN regex to detect scientific notation without decimal points; extended CustomSafeLoader.get_safe_loader() to register implicit YAML resolvers for scientific notation as float type across sign and digit characters. |
| Test Fixture: tests/data/expected/main/yaml/scientific_notation.py | New generated Pydantic model with four optional float fields featuring scientific notation defaults: exponential_default (1e-05), positive_exp (20000000000.0), negative_prefix (-30000.0), with_decimal (1.5e-05). |
| Test Coverage: tests/main/test_main_yaml.py | Added test_main_yaml_scientific_notation() to validate YAML-to-Pydantic code generation with scientific notation input; verifies output matches expected model file. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through YAML's den,
Where numbers danced in strange fashion then,
But now 1e-5 stays float, stays true,
No stringy tricks—science notation shines anew! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix YAML scientific notation parsing as float' directly matches the main change: introducing YAML scientific notation pattern recognition to parse such values as floats instead of strings. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #1955 by introducing _YAML_SCIENTIFIC_NOTATION_PATTERN and extending CustomSafeLoader to treat scientific notation as floats, with test coverage demonstrating the fix works for values like 1e-5, 20000000000, -30000, and 1.5e-05. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the objective of fixing YAML scientific notation parsing: the util.py modification adds the core fix, the test YAML file provides test data, the expected output file demonstrates correct parsing, and the test function validates the functionality. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c83470e and 14ca21d.

⛔ Files ignored due to path filters (1)
  • tests/data/yaml/scientific_notation.yaml is excluded by !tests/data/**/*.yaml and included by none
📒 Files selected for processing (3)
  • src/datamodel_code_generator/util.py
  • tests/data/expected/main/yaml/scientific_notation.py
  • tests/main/test_main_yaml.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/main/test_main_yaml.py (1)
tests/main/conftest.py (2)
  • output_file (99-101)
  • run_main_and_assert (245-409)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Analyze (python)
  • GitHub Check: py312-isort7 on Ubuntu
  • GitHub Check: 3.10 on Ubuntu
  • GitHub Check: py312-isort6 on Ubuntu
  • GitHub Check: 3.14 on Windows
  • GitHub Check: 3.10 on Windows
  • GitHub Check: 3.13 on Windows
  • GitHub Check: 3.12 on Windows
  • GitHub Check: 3.11 on Windows
  • GitHub Check: 3.14 on Ubuntu
  • GitHub Check: benchmarks
🔇 Additional comments (4)
src/datamodel_code_generator/util.py (2)

62-64: Excellent pattern definition for scientific notation.

The regex pattern correctly matches scientific notation without decimal points (e.g., 1e-5, 1E+10), addressing the core issue where such values were being parsed as strings. The pattern properly handles:

  • Optional leading signs (-, +)
  • Mantissa with optional underscores for readability (a YAML 1.1 digit-separator convention)
  • Case-insensitive exponent marker (e, E)
  • Signed exponents

The explanatory comments clearly document the purpose and the gap in standard YAML parsing.
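
The matching rules listed above can be sketched with a standalone regex. The expression and the names `SCI_NOTATION` and `looks_like_dotless_float` below are assumptions made for illustration; they are not copied from `util.py`, and the PR's actual `_YAML_SCIENTIFIC_NOTATION_PATTERN` may differ.

```python
import re

# Hypothetical pattern, NOT copied from the PR: optional sign, integer mantissa
# (underscores allowed), e/E marker, optionally signed exponent, no decimal point.
SCI_NOTATION = re.compile(r"^[-+]?[0-9][0-9_]*[eE][-+]?[0-9]+$")


def looks_like_dotless_float(text: str) -> bool:
    """Return True if text is scientific notation without a decimal point."""
    return SCI_NOTATION.match(text) is not None


# Values with a decimal point (e.g. "1.5e-5") are deliberately not matched here,
# since PyYAML's built-in float resolver already handles them.
for sample in ["1e-5", "1E+10", "-3e4", "1_000e3", "1.5e-5", "1e", "e5"]:
    print(f"{sample!r}: {looks_like_dotless_float(sample)}")
```

Anchoring the pattern with `^` and `$` keeps it from matching plain strings that merely contain something like `1e5` in the middle.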


110-116: Scientific notation pattern registration is correct.

The pattern is correctly registered for all relevant starting characters (-, +, and digits 0-9) and properly tagged as a YAML float. The implementation successfully resolves scientific notation without decimal points (like 1e-5) as float values, addressing the issue where PyYAML's default resolver, whose float pattern requires a decimal point, treats these as strings.
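
A minimal sketch of this kind of registration, assuming PyYAML is installed. The subclass name `SciNotationSafeLoader` and the regex are hypothetical stand-ins, not the PR's `CustomSafeLoader` code; only `yaml.SafeLoader`, `add_implicit_resolver`, and `yaml.load` are PyYAML's own API.

```python
import re

import yaml

# Hypothetical pattern for illustration; the PR's actual regex may differ.
SCI_NOTATION = re.compile(r"^[-+]?[0-9][0-9_]*[eE][-+]?[0-9]+$")


class SciNotationSafeLoader(yaml.SafeLoader):
    """Subclass so the extra resolver does not leak into yaml.SafeLoader."""


# Implicit resolvers are indexed by the first character of the scalar, so the
# pattern is registered once for each sign and digit a match can start with.
SciNotationSafeLoader.add_implicit_resolver(
    "tag:yaml.org,2002:float",
    SCI_NOTATION,
    list("-+0123456789"),
)

doc = "a: 1e-5\nb: 1E+10\nc: -3e4\nd: 1.5e-5\n"
plain = yaml.load(doc, Loader=yaml.SafeLoader)
patched = yaml.load(doc, Loader=SciNotationSafeLoader)

for key in plain:
    print(key, type(plain[key]).__name__, "->", type(patched[key]).__name__)
```

With the stock loader, only `d` (which has a decimal point) comes back as a float; the subclass resolves all four values as floats while leaving `yaml.SafeLoader` itself unchanged.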

tests/main/test_main_yaml.py (1)

78-91: Well-structured test for scientific notation handling.

This test effectively verifies the fix for issue #1955. The test:

  • Uses jsonschema input type, appropriate for testing default value parsing
  • Explicitly specifies pydantic_v2.BaseModel output to ensure consistent behavior
  • Includes a clear docstring explaining the expected behavior (scientific notation parsed as float, not string)
  • Follows the established testing pattern in this file
tests/data/expected/main/yaml/scientific_notation.py (1)

10-14: Expected output correctly demonstrates the fix.

This expected output validates that scientific notation values are now parsed as float literals rather than strings:

  • exponential_default: float | None = 1e-05 - scientific notation preserved as float literal ✓
  • positive_exp: float | None = 20000000000.0 - large value in decimal form (Python's repr choice) ✓
  • negative_prefix: float | None = -30000.0 - negative value in decimal form ✓
  • with_decimal: float | None = 1.5e-05 - scientific notation with decimal point preserved ✓

Before the fix, these would have been generated as strings like '1e-05'. The fix ensures they remain numeric literals, which is the correct behavior per issue #1955.
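
The mix of scientific and decimal forms in the expected output follows from how Python's `repr` formats floats. The sketch below assumes input literals such as `2e10` and `-3e4`; the YAML fixture is not shown in this review, so these inputs are guesses that happen to reproduce the listed defaults.

```python
# Assumed inputs: the YAML fixture is excluded from this review, so these
# literals are illustrative guesses, not the fixture's actual contents.
assumed_defaults = {
    "exponential_default": 1e-5,
    "positive_exp": 2e10,
    "negative_prefix": -3e4,
    "with_decimal": 1.5e-5,
}

# repr() picks scientific notation only for very small or very large magnitudes,
# which is why 2e10 is rendered in plain decimal form.
rendered = {name: repr(value) for name, value in assumed_defaults.items()}

for name, text in rendered.items():
    print(f"{name}: float | None = {text}")
```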



@github-actions
Contributor

github-actions bot commented Jan 3, 2026

📚 Docs Preview: https://pr-2913.datamodel-code-generator.pages.dev

@codspeed-hq

codspeed-hq bot commented Jan 3, 2026

CodSpeed Performance Report

Merging #2913 will degrade performance by 17.71%

Comparing fix/yaml-scientific-notation-1955 (14ca21d) with main (c83470e)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

❌ 11 regressions
⏩ 98 skipped [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | test_perf_graphql_style_pydantic_v2 | 695.7 ms | 829.1 ms | -16.09% |
| WallTime | test_perf_duplicate_names | 839.6 ms | 1,020.3 ms | -17.71% |
| WallTime | test_perf_aws_style_openapi_pydantic_v2 | 1.6 s | 2 s | -15.53% |
| WallTime | test_perf_openapi_large | 2.5 s | 2.9 s | -15.29% |
| WallTime | test_perf_all_options_enabled | 5.7 s | 6.8 s | -15.21% |
| WallTime | test_perf_kubernetes_style_pydantic_v2 | 2.2 s | 2.6 s | -15.95% |
| WallTime | test_perf_stripe_style_pydantic_v2 | 1.7 s | 2 s | -15.63% |
| WallTime | test_perf_multiple_files_input | 3.1 s | 3.8 s | -17.56% |
| WallTime | test_perf_deep_nested | 5.1 s | 6.1 s | -15.71% |
| WallTime | test_perf_large_models_pydantic_v2 | 3.1 s | 3.7 s | -16.98% |
| WallTime | test_perf_complex_refs | 1.7 s | 2 s | -16.78% |
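
The report does not state how the Efficiency column is computed. For the two rows reported in milliseconds, the figures are consistent with `BASE / HEAD - 1`; the sketch below checks that inference. The helper `relative_change` is a hypothetical name, and the formula is an assumption rather than CodSpeed's documented definition.

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Percentage change computed as base/head - 1 (negative means slower)."""
    return (base_ms / head_ms - 1.0) * 100.0


# Only the millisecond rows are used; the rows in seconds are rounded too
# coarsely in the report to reproduce their percentages.
rows = {
    "test_perf_graphql_style_pydantic_v2": (695.7, 829.1),
    "test_perf_duplicate_names": (839.6, 1020.3),
}

for name, (base, head) in rows.items():
    print(f"{name}: {relative_change(base, head):.2f}%")
```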

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov

codecov bot commented Jan 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.40%. Comparing base (a310b6f) to head (14ca21d).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2913      +/-   ##
==========================================
+ Coverage   99.38%   99.40%   +0.02%     
==========================================
  Files          92       95       +3     
  Lines       16342    16910     +568     
  Branches     1934     1991      +57     
==========================================
+ Hits        16241    16809     +568     
  Misses         52       52              
  Partials       49       49              
Flag Coverage Δ
unittests 99.40% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.


@koxudaxi koxudaxi merged commit b5a361d into main Jan 3, 2026
37 of 38 checks passed
@koxudaxi koxudaxi deleted the fix/yaml-scientific-notation-1955 branch January 3, 2026 15:42
@github-actions
Contributor

github-actions bot commented Jan 3, 2026

Breaking Change Analysis

Result: No breaking changes detected

Reasoning: This PR is a bug fix that corrects incorrect YAML parsing behavior. Previously, scientific notation without decimal points (e.g., 1e-5, 1E+10) was incorrectly parsed as strings instead of floats. The fix makes the YAML parser correctly recognize these as float values per YAML specification. While the generated code output will change for users who have YAML files with such scientific notation defaults, this change is from incorrect behavior to correct behavior - not a breaking change in the conventional sense. No API, CLI, templates, defaults, or Python version support were modified.


This analysis was performed by Claude Code Action

@github-actions
Contributor

github-actions bot commented Jan 3, 2026

🎉 Released in 0.52.1

This PR is now available in the latest release. See the release notes for details.


Development

Successfully merging this pull request may close these issues.

Casts default values of type number (scientific notation) to str